{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Lab 2 - Plotting NYC's shelter population, proportions, and adding dataframe columns\n", "\n", "Today we will look at our first dataset from [NYC Open Data](https://opendata.cityofnewyork.us) to see how New York's shelter population changes over time. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting the data\n", "\n", "The Department of Homeless Services (DHS) Daily Report dataset contains the number of individuals and families staying in the shelter system each day, starting from August 21, 2013. \n", "\n", "- Go to: [https://data.cityofnewyork.us/Social-Services/DHS-Daily-Report/k46n-sa2m](https://data.cityofnewyork.us/Social-Services/DHS-Daily-Report/k46n-sa2m)\n", "- Click on the \"View Data\" button.\n", "\n", "To keep the dataset from becoming very large (and to avoid some missing values in 2014), we will *filter* the data to include only counts from January 1, 2015 to the present. To do this:\n", "- Click on the \"Filter\" button.\n", "- On the menu that appears, click on \"Add a New Filter Condition\".\n", "- Choose \"Date of Census\" but change the \"is\" to be \"is after\".\n", "- Click in the box below and a calendar will pop up. Highlight January 1, 2015.\n", "- Click the check box to the left of the data.\n", "- It will take a few seconds (it's a large file) but the rows on the left will be filtered to be all counts after January 1, 2015. \n", "\n", "To download the file,\n", "- Click on the \"Export\" button.\n", "- Under \"Download\", choose \"CSV\".\n", "- The download will begin automatically (files are usually stored in the \"Downloads\" folder).\n", "\n", "Upload your CSV file to Jupyter Hub, and open it to see that it has been downloaded correctly. 
(You can also do this in Excel or TextEdit before uploading the file.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting\n", "\n", "First we need to import the matplotlib and pandas packages, and tell Jupyter to display the plots. The code will be the same as in Lab 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we need to create a dataframe from the CSV file, and store it in a variable. We will call our variable `shelter`, but any meaningful name that starts with a letter and consists of only letters, numbers, and the underscore character `_` will work.\n", "\n", "Type `shelter = pd.read_csv(\"DHS_Daily_Report.csv\", parse_dates=[\"Date of Census\"])` below and run your code.\n", "\n", "Notice that this command is very similar to the one we used in Lab 1. We didn't have to include the parameter `skiprows = 5` because the data starts on the first row of the file. However, we need to include the parameter `parse_dates=[\"Date of Census\"]` so that the column `Date of Census` is interpreted as dates." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that the dataframe was created correctly by displaying it on the screen.\n", "\n", "Hint: type the name of the variable storing the dataframe below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This display is still a bit unwieldy. To see just the first five rows of `shelter`, type `shelter.head()` below and run the code." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to make a line plot (as in Lab 1) of how the total number of individuals in shelter has changed over time. Therefore, our x values (horizontal axis) will be the \"Date of Census\" column and our y values (vertical axis) will be the \"Total Individuals in Shelter\" column. \n", "\n", "Can you figure out how to make the plot? If you get stuck, click on Answer below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", "shelter.plot(x=\"Date of Census\", y = \"Total Individuals in Shelter\")\n", "
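
The `parse_dates` parameter used when loading the file is what lets the x axis be read as dates. Below is a minimal sketch of the difference, assuming three made-up rows (not real DHS counts) read from a string instead of a file. The names `csv_text`, `without_dates`, and `with_dates` are only for illustration.

```python
import io
import pandas as pd

# Three made-up rows in the same shape as the DHS file; not real counts.
csv_text = '''Date of Census,Total Individuals in Shelter
01/02/2015,100
01/03/2015,105
01/04/2015,103
'''

# Without parse_dates, the date column is read as plain text.
without_dates = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the same column is converted to dates.
with_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date of Census'])

# Compare the type of the date column in each dataframe.
print(without_dates['Date of Census'].dtype)
print(with_dates['Date of Census'].dtype)
```

The second type printed should be a datetime type, which lets `plot()` lay the points out along a date axis.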
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What do you notice about this plot? What do you think causes the large dips?\n", "\n", "What happens if you don't include the parameter `y = \"Total Individuals in Shelter\"` when making the plot? Try it below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we don't specify which column to use for the y values, the `plot()` function plots all of the columns! However, right now the legend is covering up almost all of the plot. We can move the legend outside the plot by adding `.legend(bbox_to_anchor=(1,1))` to the end of our plotting command. The `.` notation means that we are applying the function on the right of the dot to the result of the code to the left of the dot, which is the plot in this case.\n", "\n", "The full command is: `shelter.plot(x=\"Date of Census\").legend(bbox_to_anchor=(1,1))` Try it below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look at the line for Total Individuals in Shelter in this plot. It is much flatter than in our first plot. Why? \n", "\n", "### Adding new columns to the dataframe\n", "\n", "This dataset only contains the numbers of individuals or families using the shelters. What if we want to know how the proportion of individuals who are adults has changed over time? For each day:\n", "\n", "$\\text{proportion of individuals who are adults in shelter} = \\frac{\\text{# of individuls who are adults in shelter}}{\\text{# of individuals (adults or children) in shelter}}$\n", "\n", "We can do this computation for all rows in the dataframe in one line of code. First, type `shelter[\"Total Adults in Shelter\"]` below and run the code. What does this piece of code refer to? 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This code produces (or *returns*) all the data values in the `Total Adults in Shelter` column, which are the numbers we will need for the numerators in our proportion formula. Recall the numerator is the top part of a fraction and the denominator is the bottom part of a fraction. \n", "\n", "Can you figure out how to get all the data values in the `Total Individuals in Shelter` column for the denominators?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", "shelter[\"Total Individuals in Shelter\"]\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now to get the proportions, the following code will do column-wise division:\n", "`shelter[\"Total Adults in Shelter\"]/shelter[\"Total Individuals in Shelter\"]`\n", "\n", "That is, for each row, the value in the `Total Adults in Shelter` column is divided by the value in the `Total Individuals in Shelter` column. \n", "\n", "Try running this code below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This code gives us all the proportions. To store the proportions in a new column in the `shelter` dataframe, we can use the following code: \n", "\n", "`shelter[\"Proportion Adults\"] = shelter[\"Total Adults in Shelter\"]/shelter[\"Total Individuals in Shelter\"]`\n", "\n", "Try running it below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What do you think the new column is called? Display the new `shelter` dataframe, and see if you were correct." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's plot the proportion of adults as the y axis and the date as the x axis. Can you figure out the code to do this? As usual the answer is hidden below if you get stuck." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", "shelter.plot(x = \"Date of Census\", y = \"Proportion Adults\")\n", "
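
The column-wise division used to build `Proportion Adults` can be checked by hand on a tiny table. This is a minimal sketch, assuming three made-up rows (not real DHS counts) in a dataframe with the hypothetical name `toy`:

```python
import pandas as pd

# Made-up counts for three days; not real DHS figures.
toy = pd.DataFrame({
    'Total Adults in Shelter': [30, 45, 60],
    'Total Individuals in Shelter': [60, 60, 80],
})

# Dividing one column by another works row by row:
# 30/60, then 45/60, then 60/80.
toy['Proportion Adults'] = toy['Total Adults in Shelter'] / toy['Total Individuals in Shelter']

# The new column appears to the right of the existing ones.
print(toy)
```

Each value in the new column comes from the two counts in the same row, so the first row holds 30/60 = 0.5.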
\n", "\n", "How is the proportion of adults in the shelters changing over time?\n", "\n", "#### Challenges\n", "- How has the proportion of children in shelter changed over time?\n", "- How has the proportion of adults who are single men changed over time? \n", "- Choose another proportion that interests you, and plot how it has changed over time." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }